Feat: Add starrocks catalog support#8856
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Pull request overview
Adds first-class StarRocks support to marimo’s SQL datasource discovery by introducing a StarRocksEngine that can enumerate catalogs and introspect external-catalog objects via SHOW .../DESC SQL when SQLAlchemy’s inspector can’t.
Changes:
- Implement
StarRocksEngine(SQLAlchemyEngine)with multi-catalog discovery and external-catalog fallbacks. - Extend SQL identifier quoting to support the
starrocksdialect (backtick style) and add related tests. - Register the new engine in engine detection and add StarRocks optional test dependency / dependency manager entry.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
marimo/_sql/engines/starrocks.py |
New engine implementing StarRocks catalog/database/table discovery with SQL fallbacks for external catalogs. |
marimo/_sql/get_engines.py |
Adds StarRocksEngine to SUPPORTED_ENGINES so it’s selected before generic SQLAlchemyEngine. |
marimo/_sql/sql_quoting.py |
Adds starrocks to backtick-quoting dialects. |
marimo/_sql/sql.py |
Updates unsupported-engine error message to mention StarRocks. |
marimo/_dependencies/dependencies.py |
Adds DependencyManager.starrocks. |
pyproject.toml |
Adds starrocks to test-optional and bans module-level starrocks imports. |
tests/_sql/test_starrocks.py |
New unit tests for StarRocks engine behavior using mocks. |
tests/_sql/test_sql_quoting.py |
Adds StarRocks quoting/roundtrip cases and qualified-name quoting case. |
tests/_sql/test_get_engines.py |
Adds StarRocks engine selection test and datasource connection mapping assertion. |
Bundle ReportChanges will increase total bundle size by 559 bytes (0.0%) ⬆️. This is within the configured threshold ✅ Detailed changes
Affected Assets, Files, and Routes:view changes for bundle: marimo-esmAssets Changed:
|
|
hi @chris-celerdata, this sounds similar to an issue with other multi-db engines like Snowflake. There is a start of the fix here #8824 Would you mind having a look and seeing if that direction would fix this problem universally, rather than custom support for Starrocks? If not, we can also have a look (wish there was Starrocks cloud haha) |
|
There is a StarRocks cloud. It can also be deployed on Docker. If you want to work with external catalogs, the documentation can be found here. However, the I took a look at the other PR and it does not fix the problem; Note: I considered doing something similar to dialect_queries = {
"postgresql": "SELECT current_database()",
"mssql": "SELECT DB_NAME()",
"timeplus": "SELECT current_database()",
}This approach could be duplicated into |
|
I think that's the eventual direction. That PR decouples get_schema, so eventually we can call get_databases and get all databases connected. I'm okay to have this too, but there'll be some code drift once we merge that. In the meantime, would you want to create the frontend UI changes for adding a starrocks connection? |
|
I'd love to add StarRocks to the UI. I was referencing #8169, which talked about keeping the UI uncluttered. Has that changed? |
|
@chris-celerdata , yeah. But I'm checking with the team on if we're okay to add to the UI. The PR referenced #8824 has been merged, does this change this PR? Sorry for the back and forth on this |
|
I have pushed a commit with changes based on the merge. Any comments would be appreciated! |
|
Thank you! This is likely to get merged in soon, which is closer to global solution we talked about. |
|
Hey @chris-celerdata! We're good to add StarRocks in the UI if you're still down. |
for more information, see https://pre-commit.ci
|
I have merged the main branch into this branch and made the changes to |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
marimo/_sql/engines/sqlalchemy.py:170
_get_inspectorexecutesSET CATALOG ...on a pooled connection but never restores the previous catalog. That session-level state can leak to later operations that reuse the same DBAPI connection (e.g., queries/introspection happening after discovery may run against the last catalog visited). Consider capturing the current catalog at entry and restoring it in afinallyblock, or otherwise resetting connection state before returning it to the pool.
from sqlalchemy import inspect, text
_use_database_dialect_command: dict[str, str] = {
"snowflake": f"USE DATABASE {self._quote_identifier(database)}",
"starrocks": f"SET CATALOG {self._quote_identifier(database)}",
}
dialect_command = _use_database_dialect_command.get(self.dialect)
if dialect_command is not None:
with self._connection.connect() as connection:
connection.execute(text(dialect_command))
| database_name: Optional[str] = None | ||
| dialect_queries = { | ||
| "postgresql": "SELECT current_database()", | ||
| "mssql": "SELECT DB_NAME()", | ||
| "timeplus": "SELECT current_database()", | ||
| "starrocks": "SELECT CATALOG()", | ||
| } |
There was a problem hiding this comment.
For StarRocks, get_default_database() adds a SELECT CATALOG() fallback, but earlier in the method it returns self._connection.url.database when present. On StarRocks URLs, that value is typically the database/schema, not the catalog, which can make default_database inconsistent with _get_database_names() (catalog list) and break default highlighting/qualification in the UI. Consider skipping the URL-database fast path when dialect == 'starrocks' and always resolving the default catalog via CATALOG() (leaving the URL database to default_schema).
| return [str(row[0]) for row in result.fetchall()] | ||
|
|
There was a problem hiding this comment.
_get_starrocks_database_names() assumes the catalog name is always in row[0]. To make this more robust across dialect/driver variations, consider selecting the column by name (using result.keys() like the Snowflake implementation) and raising a clear error if the expected column is missing.
| return [str(row[0]) for row in result.fetchall()] | |
| columns = list(result.keys()) | |
| # StarRocks generally returns a "Catalog" column for SHOW CATALOGS, | |
| # but we defensively support a few plausible variants. | |
| candidate_columns = ("Catalog", "catalog_name", "name") | |
| name_col_index: Optional[int] = None | |
| for col_name in candidate_columns: | |
| if col_name in columns: | |
| name_col_index = columns.index(col_name) | |
| break | |
| if name_col_index is None: | |
| raise RuntimeError( | |
| "Unexpected SHOW CATALOGS result: expected one of " | |
| f"{candidate_columns!r} columns, but got {columns!r}" | |
| ) | |
| return [str(row[name_col_index]) for row in result.fetchall()] |
| @@ -361,8 +376,11 @@ | |||
| Returns a single-element list with the default database when | |||
| the dialect has no dedicated discovery mechanism. | |||
| """ | |||
| if self.dialect.lower() == "snowflake": | |||
| dialect = self.dialect.lower() | |||
| if dialect == "snowflake": | |||
| return self._get_snowflake_database_names() | |||
| if dialect == "starrocks": | |||
| return self._get_starrocks_database_names() | |||
There was a problem hiding this comment.
The StarRocks-specific paths (SET CATALOG in _get_inspector, SHOW CATALOGS discovery, and the starrocks branch in _get_database_names) aren’t covered by existing SQLAlchemyEngine tests. Since this file already has unit tests (including Snowflake mocking), consider adding a mocked StarRocks test to assert the emitted SQL and that the discovery branch is exercised.
| def _get_starrocks_database_names(self) -> list[str]: | ||
| """Get catalog names for StarRocks via 'SHOW CATALOGS'. | ||
|
|
||
| StarRocks uses a three-level hierarchy (Catalog → Database → Table) | ||
| which maps to marimo's (Database → Schema → Table). | ||
| """ |
There was a problem hiding this comment.
PR description says it “introduces a new engine which extends SQLAlchemyEngine”, but there doesn’t appear to be a StarRocksEngine implementation in the codebase; instead the StarRocks behavior is implemented directly in SQLAlchemyEngine. Please either update the PR description to match the implementation or add the intended engine subclass + registration (if that’s still the plan).
| boto3 = Dependency("boto3") | ||
|
|
||
| redshift_connector = Dependency("redshift_connector") | ||
| starrocks = Dependency("starrocks") |
There was a problem hiding this comment.
DependencyManager.starrocks is introduced but appears unused (no references in the repo). If StarRocks isn’t being explicitly dependency-gated anywhere, consider removing this entry; otherwise, add the corresponding require_many/feature checks so this dependency record has an effect.
| starrocks = Dependency("starrocks") |
| "clickhouse-connect>=0.8.18", | ||
| "redshift-connector[full]>=2.1.7", | ||
| # For testing starrocks | ||
| "starrocks>=1.3.0", |
There was a problem hiding this comment.
starrocks>=1.3.0 is added to the test-optional dependency group, but the current test changes don’t import or require the StarRocks package (quoting tests are pure string logic). If there aren’t additional StarRocks tests elsewhere in the PR, consider removing this optional dependency until it’s needed, to keep the optional test environment lighter.
| "starrocks>=1.3.0", |
| else: | ||
| raise ValueError( | ||
| "Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift or DBAPI 2.0 compatible engine." | ||
| "Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift, StarRocks or DBAPI 2.0 compatible engine." |
There was a problem hiding this comment.
The updated error message lists StarRocks as a separate supported engine type, but StarRocks connections are still surfaced via the generic SQLAlchemy engine in SUPPORTED_ENGINES. Consider rewording to avoid implying a distinct engine class (e.g., “SQLAlchemy (including StarRocks dialect)” or similar).
| "Unsupported engine. Must be a SQLAlchemy, Ibis, Clickhouse, DuckDB, Redshift, StarRocks or DBAPI 2.0 compatible engine." | |
| "Unsupported engine. Must be a SQLAlchemy (including StarRocks dialect), Ibis, Clickhouse, DuckDB, Redshift, or DBAPI 2.0 compatible engine." |
|
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.22.1-dev22 |

📝 Summary
Implements #8765.
🔍 Description of Changes
This PR introduces a new engine which extends
SQLAlchemyEngine. When possible (e.g. working withdefault_catalog), the inherited methods are used; otherwise special "external" methods are used.Primary overridden functions:
get_databasesget_tables_in_schemaget_table_detailsBefore

After

Notes
Integration testing against a real StarRocks instance is not present. There is no "in-memory" equivalent like what
DuckDBandSQLitehave.Arrays of variable types default greedily to their inner types. There is currently no support for showing complex types with different icons from a standard string (i.e.
ARRAY<INT>as not simply an integer). I'm not sure if this is relevant, but I thought I would mention it anyway.📋 Checklist